Group 11: Analysis of Prostate Cancer Data

Introduction

Prostate cancer is a disease that significantly affects the lives of men, and refining its diagnostic approaches and treatment regimes requires careful study. Understanding the complexities of this condition is crucial, which is why applying advanced data analysis techniques to patient data matters. Extracting meaningful insights from extensive datasets not only deepens our understanding of prostate cancer but also equips healthcare professionals with valuable information to personalise care patient by patient. Our goal is to explore the connections, patterns, and predictive models associated with prostate cancer and patient outcomes, ultimately enhancing our ability to provide more personalised and effective care for individuals facing this challenging diagnosis.

Materials and Methods

Our analysis uses data from a randomised clinical trial by Byar & Greene that compared treatments for patients with stage 3 and stage 4 prostate cancer. Treatment consisted of different doses of diethylstilbestrol (DES). The data are publicly available (https://hbiostat.org/data/repo/prostate.xls). The initial dataset contains 502 observations of patients with prostate cancer across 18 variables, covering patient demographics, medical history, treatment received, and health status. The raw data were loaded, augmented, described, and modelled, and every step towards the results was carried out in a reproducible manner. For instance, we separated “rx” into three columns: “Treatment regime”, “mg” and “Drug”.
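The column split mentioned above can be illustrated with a minimal Python sketch. It is not the code used in the project. The example strings ("5.0 mg estrogen", "placebo"), the function name split_rx, and the mapping of each part onto “Treatment regime”, “mg” and “Drug” are all assumptions made for illustration; the actual “rx” labels and the authors' definition of the three columns may differ.

```python
import re

# Assumed label shape: "<dose> mg <drug>", e.g. "5.0 mg estrogen".
# Labels without a dose (e.g. "placebo") are handled separately below.
RX_PATTERN = re.compile(r"^\s*(?P<mg>\d+(?:\.\d+)?)\s*mg\s+(?P<drug>.+?)\s*$")


def split_rx(rx):
    """Split a raw rx label into (treatment_regime, mg, drug).

    Hypothetical mapping: the full label is kept as the treatment regime,
    the numeric dose becomes mg, and the remaining text becomes the drug.
    """
    label = rx.strip()
    match = RX_PATTERN.match(label)
    if match is None:
        # No dose in the label: assume a dose of 0 mg and reuse the label as the drug name.
        return (label, 0.0, label)
    return (label, float(match.group("mg")), match.group("drug"))


# Made-up labels, not rows taken from the dataset.
for raw in ["placebo", "0.2 mg estrogen", "5.0 mg estrogen"]:
    print(split_rx(raw))
```

Keeping the dose as a numeric column makes it usable as a continuous predictor, while the full label remains available as a categorical treatment arm.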

Data exploration

Results: Logistic regression modelling

Results: Principal Component Analysis (PCA)

For the Principal Component Analysis (PCA), three primary steps were undertaken. First, the data were examined in PC coordinates. Second, the rotation matrix was analysed. Finally, emphasis was placed on understanding the variance explained by each principal component.
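The three steps can be sketched in Python with NumPy. This is an illustrative sketch on made-up random data, not the project's own analysis. The function name pca, the choice to standardise each variable, and the use of a singular value decomposition are assumptions.

```python
import numpy as np


def pca(X):
    """PCA on standardised columns via singular value decomposition.

    Returns (scores, rotation, explained):
      scores    - the observations expressed in PC coordinates
      rotation  - the rotation matrix, one column of loadings per PC
      explained - the fraction of total variance explained by each PC
    """
    X = np.asarray(X, dtype=float)
    # Centre and scale each variable so that no single unit dominates.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s                        # step 1: data in PC coordinates
    rotation = Vt.T                       # step 2: rotation matrix
    eigenvalues = s**2 / (Z.shape[0] - 1)
    explained = eigenvalues / eigenvalues.sum()  # step 3: variance explained
    return scores, rotation, explained


# Made-up data: 50 observations of 4 numeric variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
scores, rotation, explained = pca(X)
print(np.round(explained, 3))
```

The first entry of explained corresponds to the share of total variance captured by PC1, and the cumulative sum shows how many components are needed to reach a given share.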

Discussion

PCA is used to look for groupings and patterns in the data based on all the appropriate variables. It did not work particularly well for this dataset: just over 25% of the total variance is captured by the first principal component (PC1), which is low. Some datasets inherently require multiple principal components to represent different aspects of their variability, as is the case here.